Trying out community detection. At this point we can only do this on the STRING-based weighted edges generated in this notebook.
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# Some nice default configuration for plots
plt.rcParams['figure.figsize'] = 10, 7.5
plt.rcParams['axes.grid'] = True
plt.gray()
In [2]:
import pandas as pd
In [3]:
cd ../../forGAVIN/mergecode/OUT/
In [4]:
!git annex unlock ../../../HBP/testdata/edgelist_update_weighted.txt
In [5]:
cp edgelist_update_weighted.txt ../../../HBP/testdata/
In [6]:
cd ../../../HBP/
Community detection code from Colin written in C++:
In [292]:
!make
This is how to run it:
In [7]:
!./run
Output is written to files in the OUT directory, so we have to unlock them first.
In [7]:
!git annex unlock OUT/*
In [8]:
!./run -file testdata/edgelist_update_weighted.txt -weighted 0 -algorithm 3 -quite 1
This file describes the communities produced:
In [9]:
!head -n 10 OUT/communityout.txt
In [10]:
!git annex unlock unweightedOUT/*
In [11]:
cp OUT/* unweightedOUT/
In [12]:
!./run -file testdata/edgelist_update_weighted.txt -weighted 1 -algorithm 3 -quite 1
We can provide a measure of the similarity of the communities detected using a Normalised Mutual Information (NMI) test. As Wikipedia puts it:
mutual information measures the information that X and Y share: it measures how much knowing one of these variables reduces uncertainty about the other.
Scikit-learn provides this test; all that's required is to load the data into Python.
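For reference, the quantity being computed is the mutual information between the two community assignments, normalised by their entropies. The exact normalisation varies (arithmetic or geometric mean of the entropies, depending on convention and sklearn version), but it has the general form
$$\mathrm{NMI}(U,V) = \frac{2\,I(U;V)}{H(U)+H(V)},$$
so identical partitions score 1 and independent partitions score close to 0.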
In [13]:
import csv
In [14]:
def parsecommunityout(fname):
    # parse the node -> community assignments from the C++ output file
    f = open(fname)
    c = csv.reader(f, delimiter=" ")
    communitydict = {}
    for l in c:
        # keep only the lines that assign a node to a community (they start with "we")
        if l[0] == "we":
            communitydict[l[-1].strip("()")] = int(l[5])
    sortedids = sorted(communitydict.keys(), key=int)
    communities = map(communitydict.__getitem__, sortedids)
    return sortedids, communities
In [15]:
wids,wcom = parsecommunityout("OUT/communityout.txt")
In [16]:
uwids,uwcom = parsecommunityout("unweightedOUT/communityout.txt")
In [17]:
import sklearn.metrics
In [18]:
sklearn.metrics.normalized_mutual_info_score(wcom,uwcom)
Out[18]:
But this result might not be correct, as the class labels sorting the nodes into communities are arbitrary.
Again, the code below is not mine:
In [19]:
from __future__ import division
import collections, re, math, sys, csv
In [20]:
#--- helper function
def convertStr(s):
    ret = int(s)
    return ret
In [21]:
rows, new, dic = [], [], []
consensus = "OUT/communities_cytoscape.txt"
community = "unweightedOUT/communities_cytoscape.txt"
#--- read in first community file
with open(consensus, 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter='=')
    headerline = reader.next();
    for row in reader:
        new.append(row)
#--- read in second community file
with open(community, 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter='=')
    headerline = reader.next();
    for row in reader:
        dic.append(row)
In [22]:
!git annex unlock OUT/community-metric_nmi.txt
In [23]:
for i in dic:
    for j in new:
        if i[0] == j[0]:
            i.append(j[1])
dic.sort(key=lambda x: int(x[2]))
#--- count of nodes in each community in reality and in algorithm
realc, algc = [], []
for i in dic:
    realc.append(convertStr(i[1]))
    algc.append(convertStr(i[2]))
comcount = dict([(i,realc.count(i)) for i in set(realc)])
pcount = dict([(i,algc.count(i)) for i in set(algc)])
#--- holds the algorithm community label with the corresponding community labels
#    provided e.g., {1:{2:2,3:3}}
algcount, tempd = {}, {}
templist = []
comtrack = 0
for key, val in pcount.iteritems():
    templist = realc[comtrack:comtrack+val]
    comtrack += val
    for i in set(templist):
        tempd[i] = templist.count(i)
    algcount[key] = tempd
    tempd = {}
#--- finding the node with the maximum label in that community
#    key = int, value = dictionary
label = {}
for key, value in algcount.iteritems():
    label[key] = max(value, key=value.get)
for i in dic:
    for j in label:
        if (int)(i[2]) == j:
            i.append(label[j])
#--- METRICS
#--- Calculate the NMI
nt, np, ntp = {}, {}, {}
NMI = 0
nodes = len(dic)
for i in range(1, max(comcount)+1):
    for j in range(1, max(pcount)+1):
        nt[i] = 0
        np[j] = 0
        ntp[(str(i)+' '+str(j))] = 0
nt = comcount
np = pcount
for i in dic:
    ntp[(str(i[1]).strip()+' '+str(i[2]).strip())] += 1
num, den = 0, 0
for i, t in nt.iteritems():
    for j, p in np.iteritems():
        temp, temp2 = 0, 0
        if (ntp[(str(i).strip()+' '+str(j).strip())] > 0) and (t > 0) and (p > 0):
            temp = ((ntp[(str(i).strip()+' '+str(j).strip())]*nodes)/(t*p))
            temp2 = ntp[(str(i).strip()+' '+str(j).strip())]* math.log(temp)
            num += temp2
d1, d2 = 0, 0
for t in nt.values():
    d1 += t * math.log(t/nodes)
for p in np.values():
    d2 += p * math.log(p/nodes)
den = d1 + d2
NMI = -(2*num)/den
#--- print results to screen
print 'NMI = %s' %(NMI)
f = open('OUT/community-metric_nmi.txt', 'w')
f.writelines("node\treal\talg\tnewlabel\n")
f.writelines("%s\t%s\t%s\t%s\n" % (i[0],i[1],i[2],i[3]) for i in dic)
f.writelines('NMI = %s' %(NMI))
f.close()
So the result is almost exactly the same, but the above code appears to take into account the problem with arbitrary community labels. Good to see both methods getting similar results, in any case.
In [24]:
dew = pd.read_csv("OUT/permute_p_values_disease.csv", delimiter="\t")
deuw = pd.read_csv("unweightedOUT/permute_p_values_disease.csv", delimiter="\t")
Check papers to read about the hypergeometric disease enrichment test:
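As a reminder of the general shape of that test (a minimal sketch with scipy and made-up numbers, not the implementation that produced these CSVs): with M genes in the network, n of them annotated to a disease, and a community of N genes containing k disease genes, the enrichment p-value is the probability of seeing at least k disease genes in the community by chance.
from scipy import stats
# hypothetical counts: 2000 genes overall, 50 annotated to the disease,
# and a community of 40 genes containing 6 of them
M, n, N, k = 2000, 50, 40, 6
pval = stats.hypergeom.sf(k - 1, M, n, N)  # P(X >= k)
print pval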
In [25]:
dew.head()
Out[25]:
In [26]:
deuw.head()
Out[26]:
How to interpret these results? I suppose we would like to split them by disease and then find the entries with low p-values, probably by sorting. All of the disease columns are regularly spaced, so it should be easy to split them into separate dataframes:
In [27]:
def splitbydisease(diseasedataframe):
    diseasedict = {}
    clabels = diseasedataframe.iloc[:,[0,1,3,4,5,6,7]].columns
    for offset in range(2,diseasedataframe.columns.shape[0]-1,6):
        diseaseindexes = [0,1] + range(offset+1,offset+6)
        diseasedict[diseasedataframe.columns[offset]] = diseasedataframe.iloc[:,diseaseindexes]
        #fix the column names
        diseasedict[diseasedataframe.columns[offset]].columns = clabels
    return diseasedict
In [28]:
deuwdict = splitbydisease(deuw)
In [29]:
dewdict = splitbydisease(dew)
Now we can search for diseases in each of these communities with p-values less than 5%.
In [30]:
def significantdiseases(diseasedict):
    sigdiseasedict = {}
    for k in diseasedict:
        pvalues = diseasedict[k].iloc[:,2]
        pvalues = pvalues[pvalues<0.05]
        sigdiseasedict[k] = diseasedict[k].iloc[pvalues.index]
    return sigdiseasedict
In [31]:
sigdeuwdict = significantdiseases(deuwdict)
In [32]:
sigdewdict = significantdiseases(dewdict)
Now we want to iterate through the diseases and sort by the lowest p-values, while annotating which disease and community is involved. Should probably build a new dataframe and put this information in it.
To keep the analysis simple, we only want to consider schizophrenia and Alzheimer's disease.
In [33]:
interestdiseases = ["Alzheimer's_disease",'schizophrenia']
In [34]:
for k in sigdeuwdict:
    sigdeuwdict[k].insert(2,'disease',k)
pieces = map(lambda k: sigdeuwdict[k],interestdiseases)
In [35]:
sigunweighted = pd.concat(pieces)
In [36]:
for k in sigdewdict:
    sigdewdict[k].insert(2,'disease',k)
pieces = map(lambda k: sigdewdict[k],interestdiseases)
In [37]:
sigweighted = pd.concat(pieces)
In [38]:
sigunweighted.sort(columns='p-value')
Out[38]:
In [39]:
sigweighted.sort(columns='p-value')
Out[39]:
Now all that's required is to retrieve the IDs of the proteins in each of these communities to compare the different disease-associated communities. Wait, Cytoscape already does that for me. I can just use that.
Writing these results to a file so I can discuss them later:
In [40]:
!git annex unlock unweighted_significant_disease_communities.tsv
In [41]:
sigunweighted.sort(columns='p-value').to_csv("unweighted_significant_disease_communities.tsv",
sep="\t",index=False)
In [42]:
!git annex unlock weighted_significant_disease_communities.tsv
In [43]:
sigweighted.sort(columns='p-value').to_csv("weighted_significant_disease_communities.tsv",
sep="\t",index=False)
In [44]:
!head unweighted_significant_disease_communities.tsv
The obvious question to ask is what the crossover is between the high-confidence disease communities in each of these groupings. We can find this by comparing the members of each of these communities. While doing the NMI test we loaded these communities, so we can access and compare them easily.
In [45]:
weightedcommunities = {}
for l in zip(*parsecommunityout("OUT/communityout.txt")):
    try:
        weightedcommunities[l[1]] += [l[0]]
    except KeyError:
        weightedcommunities[l[1]] = [l[0]]
In [46]:
unweightedcommunities = {}
for l in zip(*parsecommunityout("unweightedOUT/communityout.txt")):
    try:
        unweightedcommunities[l[1]] += [l[0]]
    except KeyError:
        unweightedcommunities[l[1]] = [l[0]]
In [47]:
def findsimilarcom(community,communitydict):
    overlapdict = {}
    for k in communitydict:
        #make two lists of inclusion/exclusion for all shared ids
        uwids = community
        wids = communitydict[k]
        tids = set(uwids+wids)
        winc,uwinc = [],[]
        for i in tids:
            winc.append(int(i in wids))
            uwinc.append(int(i in uwids))
        #fraction of the union found in both communities (a Jaccard-style overlap)
        overlap = sum(map(lambda a,b: int(a==b), uwinc,winc))/len(tids)
        if overlap > 0.0:
            overlapdict[k] = overlap
    return overlapdict
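A quick toy check of the overlap measure above (not part of the original analysis): two communities sharing two members out of four in total should score 0.5.
print findsimilarcom(['a','b','c'], {1: ['b','c','d']})  # {1: 0.5}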
In [48]:
findsimilarcom(unweightedcommunities[29],weightedcommunities)
Out[48]:
So many of these proteins are spread around other, smaller communities in the weighted case, but most are found in another community which overlaps by 31%.
In [49]:
len(weightedcommunities[33]),len(unweightedcommunities[29])
Out[49]:
Investigating the effect of the weights: we would like to know which weightings might have caused the community to be split like this. We can inspect the graphs more easily by plotting them:
In [50]:
#load the edges of the graph
#(import the numpy functions that are used without the np prefix in this and later cells)
from numpy import loadtxt, linspace, array, average, savez
interactions = loadtxt("testdata/edgelist_update_weighted.txt",dtype=str)
interactions = interactions[1:]
In [51]:
import networkx as nx
In [52]:
def plotcommunities(com1,com2):
    G = nx.Graph()
    for l in interactions:
        if l[0] in set(com1+com2) and l[1] in set(com1+com2):
            G.add_edge(l[0],l[1],weight=float(l[2]))
    edict = {}
    lim = min([d['weight'] for (u,v,d) in G.edges(data=True)])
    diff = linspace(lim,1.0,10)[1] - linspace(lim,1.0,10)[0]
    for x in linspace(lim,1.0,10):
        edict[x] = [(u,v) for (u,v,d) in G.edges(data=True) if d['weight'] > x and d['weight'] < x+diff]
    pos = nx.circular_layout(com1)
    pos2 = nx.circular_layout(set(com2)-set(com1))
    for k in pos2:
        pos2[k] = array([pos2[k][0]+1.5,pos2[k][1]])
    pos = dict(pos.items()+pos2.items())
    nx.draw_networkx_nodes(G,pos,node_size=20,alpha=0.5)
    for k in edict:
        nx.draw_networkx_edges(G,pos,edgelist=edict[k],alpha=(k-lim)*(1/(1-lim)),edge_color='r')
        #nx.draw_networkx_edges(G,pos,edgelist=edict[k],edge_color='r')
    l = nx.draw_networkx_labels(G,pos,font_size=10,font_family='sans-serif')
    return None
In [53]:
plotcommunities(weightedcommunities[33],unweightedcommunities[29])
In [80]:
import os
In [81]:
plotpath = os.path.abspath("../plots/bayes/")
In [82]:
savez(os.path.join(plotpath,"nx2933.npz"),unweightedcommunities[29],weightedcommunities[33])
In [54]:
plotcommunities(unweightedcommunities[29],weightedcommunities[33])
Average weighting of the edges within each community:
In [56]:
def averageweight(community):
    weights = []
    for l in interactions:
        if l[0] in community and l[1] in community:
            weights.append(float(l[2]))
    return average(weights)
In [57]:
averageweight(weightedcommunities[33])
Out[57]:
In [58]:
averageweight(unweightedcommunities[29])
Out[58]:
In [59]:
s=findsimilarcom(weightedcommunities[44],unweightedcommunities)
s
Out[59]:
So the proteins involved are split among some larger communities in the unweighted case, mostly in community 6. We can plot these:
In [60]:
plotcommunities(weightedcommunities[44],unweightedcommunities[64])
In [83]:
savez(os.path.join(plotpath,"nx6444.npz"),unweightedcommunities[64],weightedcommunities[44])
In [61]:
plotcommunities(unweightedcommunities[64],weightedcommunities[44])
In [62]:
schw = pd.read_csv("OUT/permute_p_values_SCH_enriched.csv", delimiter="\t")
schuw = pd.read_csv("unweightedOUT/permute_p_values_SCH_enriched.csv", delimiter="\t")
In [63]:
sigschwdict = significantdiseases(splitbydisease(schw))
sigschuwdict = significantdiseases(splitbydisease(schuw))
In [64]:
for k in sigschwdict:
    sigschwdict[k].insert(2,'disease',k)
pieces = map(lambda k: sigschwdict[k],sigschwdict)
In [65]:
sigschweighted = pd.concat(pieces)
In [66]:
for k in sigschuwdict:
    sigschuwdict[k].insert(2,'disease',k)
pieces = map(lambda k: sigschuwdict[k],sigschuwdict)
In [67]:
sigschunweighted = pd.concat(pieces)
In [68]:
sigschweighted.sort(columns='p-value')
Out[68]:
In [69]:
sigschunweighted.sort(columns='p-value')
Out[69]:
In [72]:
!git annex unlock unweighted_significant_SCH_communities.tsv
In [73]:
!git annex unlock weighted_significant_SCH_communities.tsv
In [74]:
#save these tables
sigschunweighted.sort(columns='p-value').to_csv("unweighted_significant_SCH_communities.tsv",
sep="\t",index=False)
sigschweighted.sort(columns='p-value').to_csv("weighted_significant_SCH_communities.tsv",
sep="\t",index=False)
In [75]:
overlap = pd.read_csv("unweightedOUT/overlap_dis_SCHenriched.csv", sep="\t",header=None,names=range(139))
In [76]:
overlap = overlap.fillna('')
We are only interested in the overlap with schizophrenia, so we want to cut out just that column.
In [77]:
overlap = overlap.iloc[:,[0,1,108]]
In [78]:
with open("unweightedOUT/overlap_dis_SCHenriched.csv") as f:
c = csv.reader(f,delimiter="\t")
overlap = {}
for l in c:
try:
if l[0] == 'community':
currentcommunity = l[1]
if len(l) > 108:
if l[108] == '1':
try:
overlap[currentcommunity] += [l[0]]
except KeyError:
overlap[currentcommunity] = [l[0]]
except:
pass
Ok, so we now have access to this data.
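For example, one way to summarise it (a quick sketch on top of the overlap dict built above, which maps each community to the list of gene ids overlapping the schizophrenia column):
# communities with the most genes overlapping schizophrenia
for k in sorted(overlap, key=lambda c: len(overlap[c]), reverse=True)[:5]:
    print k, len(overlap[k])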